A Hybrid Word-Character Model for Abstractive Summarization

Authors

  • Chieh-Teng Chang
  • Chi-Chia Huang
  • Jane Yung-jen Hsu
Abstract

Abstractive summarization is a popular research topic. Because of differences in language properties, Chinese summarization has also attracted considerable attention. Most studies use character-based rather than word-based representations to avoid errors introduced by word segmentation and the out-of-vocabulary (OOV) problem. However, we believe that word-based representations can capture the semantics of articles more accurately. We propose a hybrid word-character model that preserves the advantages of both word-based and character-based representations. Our method also enables us to use a larger word vocabulary than previous work. We call this new method HWC (Hybrid Word-Character). We conduct experiments on the LCSTS Chinese summarization dataset and outperform the current state of the art by at least 8 ROUGE points.


Similar resources

A Hybrid Approach to Multi-document Summarization of Opinions in Reviews

We present a hybrid method to generate summaries of product and services reviews by combining natural language generation and salient sentence selection techniques. Our system, STARLET-H, receives as input textual reviews with associated rated topics, and produces as output a natural language document summarizing the opinions expressed in the reviews. STARLET-H operates as a hybrid abstractive/...


TL;DR: Improving Abstractive Summarization Using LSTMs

Traditionally, summarization has been approached through extractive methods. However, they have produced limited results. More recently, neural sequence-to-sequence models for abstractive text summarization have shown more promise, although the task still proves to be challenging. In this paper, we explore current state-of-the-art architectures and reimplement them from scratch. We begin with a ...


Multi-Document Abstractive Summarization Using ILP Based Multi-Sentence Compression

Abstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise informative summaries. In this work, we aim at developing an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other ...


A Neural Attention Model for Abstractive Sentence Summarization

Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method utilizes a local attention-based model that generates each word of the summary conditioned on the input sentence. While the model is structurally simple, it...


Automatic Community Creation for Abstractive Spoken Conversations Summarization

Summarization of spoken conversations is a challenging task, since it requires deep understanding of dialogs. Abstractive summarization techniques rely on linking the summary sentences to sets of original conversation sentences, i.e. communities. Unfortunately, such linking information is rarely available or requires trained annotators. We propose and experiment automatic community creation usi...



Journal:
  • CoRR

Volume: abs/1802.09968

Publication date: 2018